SEQUENCE LISTING 



GENERAL INFORMATION 

(i) APPLICANT: Koopman, Peter 

Goodfellow, Peter 

(ii) TITLE OF THE INVENTION: SOX-9 GENE AND PROTEIN AND

USE IN THE REGENERATION OF BONE OR CARTILAGE 

(iii) NUMBER OF SEQUENCES: 21 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Scully, Scott, Murphy & Presser 

(B) STREET: 400 Garden City Plaza 

(C) CITY: Garden City 

(D) STATE: NY 

(E) COUNTRY: U.S.A. 

(F) ZIP: 11530 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ Version 1.5 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/860,635

(B) FILING DATE: 29-MAY-1997

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: AU PM9714 

(B) FILING DATE: 29-NOV-1994

(A) APPLICATION NUMBER: AU PM9835

(B) FILING DATE: 05-DEC-1994 

(A) APPLICATION NUMBER: PCT/AU95/00799 

(B) FILING DATE: 29-NOV-1995

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: DiGiglio, Frank S. 

(B) REGISTRATION NUMBER: 31,346 

(C) REFERENCE/DOCKET NUMBER: 10981 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 516-742-4343 

(B) TELEFAX: 516-742-4366 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AATTAAA 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

KpAAAGTCCT AAAGGTGGG 



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TTTCAGGCAA ATAAGGCAG 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4 
TGGCAATCTA ACAGATGAGA 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5 
TCNCAAATGT CATATATCCA 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
JfeTCCAGATT GACTGGAACA CA



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 
GCAATAAGAT ACTAATATGT AGAG 



(2) INFORMATION FOR SEQ ID NO: 8: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GTCAGCAGAA ATCCTAAAGG 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
JfACTAATGCC GATGGTTAAG 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10 
CGCCTCGAGG TGGCTTATCG 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 




ATCATACACA TACGATTTAG GTGAC 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GAGGAAGTCG GTGAAGAAC 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 
5lCGCTCATGC CGGAGGAGGA G 

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 
GCAATCCCAG GGCCCACCGA C 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
TTGGAGATGA CGTCGACTGC TC 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GCAGCGACGT CATCTCCAAC

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 
GCTGCTTGGA CATCCACACG T 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2249 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 



AGTTTCAGTC CAGGAACTTT TCTTTGCAAG AGAGACGAGG TGCAAGTGGC 50 

CCCGGTTTCG TTCTCTGTTT TCCCTCCCTC CTCCTCCGCT CCGACTCGCC 100 

TTCCCCGGGT TTAGAGCCGG CAGCTGAGAC CCGCCACCCA GCGCCTCTGC 150 

TAAGTGCCCG CCGCCGCAGC CCGGTGACGC GCCAACCTCC CCGGGAGCCG 200 

TTCGCTCGGC GTCCGCGTCC GGGCAGCTGA GGGAAGAGGA GCCCCAGCCG 250 

CCGCGGCTTC TCGCCTTTCC CGGCCACCCG CCCCCTGCCC CGGGCTCGCG 300 

TATGAATCTC CTGGACCCCT TCATGAAGAT GACCGACGAG CAGGAGAAGG 350 

GCCTGTCTGG CGCCCCCAGC CCCACCATGT CGGAGGACTC GGCTGGTTCG 400 

CCCTGTCCCT CGGGCTCCGG CTCGGACACG GAGAACACCC GGCCCCAGGA 450 

GAACACCTTC CCCAAGGGCG AGCCGGATCT GAAGAAGGAG AGCGAGGAAG 500 

ATAAGTTCCC CGTGTGCATC CGCGAGGCGG TCAGCCAGGT GCTGAAGGGC 550

TACGACTGGA CGCTGGTGCC CATGCCCGTG CGCGTCAACG GCTCCAGCAA 600

GAACAAGCCA CACGTCAAGC GACCCATGAA CGCCTTCATG GTGTGGGCGC 650

AGGCTGCGCG CAGGAAGCTG GCAGACCAGT ACCCGCATCT GCACAACGCG 700

GAGCTCAGCA AGACTCTGGG CAAGCTCTGG AGGCTGCTGA ACGAGAGCGA 750

GAAGAGACCC TTCGTGGAGG AGGCGGAGCG GCTGCGCGTG CAGCACAAGA 800

AAGACCACCC CGATTACAAG TACCAGCCCC GGCGGAGGAA GTCGGTGAAG 850

AACGGACAAG CGGAGGCCGA AGAGGCCACG GAACAGACTC ACATCTCTCC 900

TAATGCTATC TTCAAGGCGC TGCAAGCCGA CTCCCCACAT TCCTCCTCCG 950 

GCATGAGTGA GGTGCACTCC CCGGGCGAGC ACTCTGGGCA ATCTCAGGGT 1000 

CCGCCGACCC CACCCACCAC TCCCAAAACC GACGTGCAAG CTGGCAAAGT 1050 

TGATCTGAAG CGAGAGGGGC GCCCTCTGGC AGAGGGGGGC AGACAGCCCC 1100 

CCATCGACTT CCGCGACGTG GACATCGGTG AACTGAGCAG CGACGTCATC 1150 

TCCAACATTG AGACCTTCGA CGTCAATGAG TTTGACCAAT ACTTGCCACC 1200 

CAACGGCCAC CCAGGGGTTC CGGCCACCCA CGGCCAGGTC ACCTACACTG 1250 

GCAGTTACGG CATCAGCAGC ACCGCACCCA CCCCTGCGAC CGCGGGCCAC 1300

GTGTGGATGT CGAAGCAGCA GGCGCCGCCC CCTCCTCCGC AGCAGCCTCC 1350 

GCAGGCCCCG CAAGCCCCAC AGGCGCCTCC GCAGCAGCAA GCACCCCCGC 1400 



AGCAGCCGCA GGCACCCCAG CAGCAGCAGG CACACACGCT CACCACGCTG 1450 

AGCAGCGAGC CAGGCCAGTC CCAGCGAACG CACATCAAGA CGGAGCAGCT 1500 

GAGCCCCAGC CACTACAGGG AGCAGCAGCA GCACTCCCCG CAACAGATCT 1550 

CCTACAGCCC CTTCAACCTT CCTCACTACA GGCCCTCCTA CCCGCCCATC 1600 

ACCCGTTCGG AATACGACTA CGCTGACCAT CAGAACTCCG GCTCCTACTA 1650

CAGTCACGCA GCCGGCCAGG GCTCAGGGCT CTACTCCACC TTCACTTACA 1700 

TGAACCCCGC GCAGCGCCCC ATGTACACCC CCATCGGTGA CACCTCCGGG 1750

GTCCCTTCCA TCCCGCAGAC CCACAGCCCG CAGGACTGGG AACAACCAGT 1800 

CTACACACAG GTCACCAGAC CCTGAGAAGA GAAAAGCTAT GGTGACAGAG 1850

CTGATCTTTT TTTTTTTTTT TTTTTAAAGA AGAAAAGAAA GAAACGAAAA 1900 

AGAAAAAGCT GAAGGAAATC AAGAACCAAT TGAAATTCCT TTGGACACTT 1950

TTTTTTTTGT CCTTTCGTTA ATTTTTAAAA GACATGTAAA GGAAGGTAAC 2000

GATTGCTGGG CATTCCAGGA GAGAGACTTT AAGACTTTGT CTGAGCTCAT 2050

GACAACATAT TGCAAATGGC CGGGCCACTC GTGGCCAGAC GGACAGCACT 2100

CCTGGCCAGA TGGACCCACC AGTATCAGCG AGGAGGGGCT TGTCTCCTTC 2150

AGAGTTAACA TGGAGGACGA TTGGAGAATC TCCCTGCCTG TTTGGACTTT 2200

GTAATTATTT TTTAGCCGTA ATTAAAGAAA AAAAAAGTCC AAAAAAAAA 2249 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 507 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Asn Leu Leu Asp Pro Phe Met Lys Met Thr Asp Glu Gln Glu Lys
1 5 10 15

Gly Leu Ser Gly Ala Pro Ser Pro Thr Met Ser Glu Asp Ser Ala Gly
20 25 30

Ser Pro Cys Pro Ser Gly Ser Gly Ser Asp Thr Glu Asn Thr Arg Pro
35 40 45

Gln Glu Asn Thr Phe Pro Lys Gly Glu Pro Asp Leu Lys Lys Glu Ser
50 55 60

Glu Glu Asp Lys Phe Pro Val Cys Ile Arg Glu Ala Val Ser Gln Val
65 70 75 80

Leu Lys Gly Tyr Asp Trp Thr Leu Val Pro Met Pro Val Arg Val Asn
85 90 95

Gly Ser Ser Lys Asn Lys Pro His Val Lys Arg Pro Met Asn Ala Phe
100 105 110

Met Val Trp Ala Gln Ala Ala Arg Arg Lys Leu Ala Asp Gln Tyr Pro
115 120 125

His Leu His Asn Ala Glu Leu Ser Lys Thr Leu Gly Lys Leu Trp Arg
130 135 140

Leu Leu Asn Glu Ser Glu Lys Arg Pro Phe Val Glu Glu Ala Glu Arg
145 150 155 160

Leu Arg Val Gln His Lys Lys Asp His Pro Asp Tyr Lys Tyr Gln Pro
165 170 175

Arg Arg Arg Lys Ser Val Lys Asn Gly Gln Ala Glu Ala Glu Glu Ala
180 185 190

Thr Glu Gln Thr His Ile Ser Pro Asn Ala Ile Phe Lys Ala Leu Gln
195 200 205

Ala Asp Ser Pro His Ser Ser Ser Gly Met Ser Glu Val His Ser Pro
210 215 220

Gly Glu His Ser Gly Gln Ser Gln Gly Pro Pro Thr Pro Pro Thr Thr
225 230 235 240

Pro Lys Thr Asp Val Gln Ala Gly Lys Val Asp Leu Lys Arg Glu Gly
245 250 255

Arg Pro Leu Ala Glu Gly Gly Arg Gln Pro Pro Ile Asp Phe Arg Asp
260 265 270

Val Asp Ile Gly Glu Leu Ser Ser Asp Val Ile Ser Asn Ile Glu Thr
275 280 285

Phe Asp Val Asn Glu Phe Asp Gln Tyr Leu Pro Pro Asn Gly His Pro
290 295 300

Gly Val Pro Ala Thr His Gly Gln Val Thr Tyr Thr Gly Ser Tyr Gly
305 310 315 320

Ile Ser Ser Thr Ala Pro Thr Pro Ala Thr Ala Gly His Val Trp Met
325 330 335

Ser Lys Gln Gln Ala Pro Pro Pro Pro Pro Gln Gln Pro Pro Gln Ala
340 345 350

Pro Gln Ala Pro Gln Ala Pro Pro Gln Gln Gln Ala Pro Pro Gln Gln
355 360 365

Pro Gln Ala Pro Gln Gln Gln Gln Ala His Thr Leu Thr Thr Leu Ser
370 375 380

Ser Glu Pro Gly Gln Ser Gln Arg Thr His Ile Lys Thr Glu Gln Leu
385 390 395 400

Ser Pro Ser His Tyr Arg Glu Gln Gln Gln His Ser Pro Gln Gln Ile
405 410 415

Ser Tyr Ser Pro Phe Asn Leu Pro His Tyr Arg Pro Ser Tyr Pro Pro
420 425 430

Ile Thr Arg Ser Glu Tyr Asp Tyr Ala Asp His Gln Asn Ser Gly Ser
435 440 445

Tyr Tyr Ser His Ala Ala Gly Gln Gly Ser Gly Leu Tyr Ser Thr Phe
450 455 460

Thr Tyr Met Asn Pro Ala Gln Arg Pro Met Tyr Thr Pro Ile Gly Asp
465 470 475 480

Thr Ser Gly Val Pro Ser Ile Pro Gln Thr His Ser Pro Gln Asp Trp
485 490 495

Glu Gln Pro Val Tyr Thr Gln Val Thr Arg Pro
500 505

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3923 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

CGGAGCTCGA AACTGACTGG AAACTTCAGT GGCGCGGAGA CTCGCCAGTT TCAACCCCGG 60 

AAACTTTTCT TTGCAGGAGG AGAAGAGAAG GGGTGCAAGC GCCCCCACTT TTGCTCTTTT 120

TCCTCCCCTC CTCCTCCTCT CCAATTCGCC TCCCCCCACT TGGAGCGGGC AGCTGTGAAC 180 

TGGCCACCCC GCGCCTTCCT AAGTGCTCGC CGCGGTAGCC GGCCGACGCG CCAGCTTCCC 240 

CGGGAGCCGC TTGCTCCGCA TCCGGGCAGC CGAGGGGAGA GGAGCCCGCG CCTCGAGTCC 300

CCGAGCCGCC GCGGCTTCTC GCCTTTCCCG GCCACCAGCC CCCTGCCCCG GGCCCGCGTA 360

TGAATCTCCT GGACCCCTTC ATGAAGATGA CCGACGAGCA GGAGAAGGGC CTGTCCGGCG 420 



CCCCCAGCCC CACCATGTCC GAGGACTCCG CGGGCTCGCC CTGCCCGTCG GGCTCCGGCT 480

CGGACACCGA GAACACGCGG CCCCAGGAGA ACACGTTCCC CAAGGGCGAG CCCGATCTGA 540

AGAAGGAGAG CGAGGAGGAC AAGTTCCCCG TGTGCATCCG CGAGGCGGTC AGCCAGGTGC 600

TCAAAGGCTA CGACTGGACG CTGGTGCCCA TGCCGGTGCG CGTCAACGGC TCCAGCAAGA 660

ACAAGCCGCA CGTCAAGCGG CCCATGAACG CCTTCATGGT GTGGGCGCAG GCGGCGCGCA 720

GGAAGCTCGC GGACCAGTAC CCGCACTTGC ACAACGCCGA GCTCAGCAAG ACGCTGGGCA 780

AGCTCTGGAG ACTTCTGAAC GAGAGCGAGA AGCGGCCCTT CGTGGAGGAG GCGGAGCGGC 840

TGCGCGTGCA GCACAAGAAG GACCACCCGG ATTACAAGTA CCAGCCGCGG CGGAGGAAGT 900

CGGTGAAGAA CGGGCAGGCG GAGGCAGAGG AGGCCACGGA GCAGACGCAC ATCTCCCCCA 960

ACGCCATCTT CAAGGCGCTG CAGGCCGACT CGCCACACTC CTCCTCCGGC ATGAGCGAGG 1020

TGCACTCCCC CGGCGAGCAC TCGGGGCAAT CCCAGGGCCC ACCGACCCCA CCCACCACCC 1080

CCAAAACCGA CGTGCAGCCG GGCAAGGCTG ACCTGAAGCG AGAGGGGCGC CCCTTGCCAG 1140

AGGGGGGCAG ACAGCCCCCT ATCGACTTCC GCGACGTGGA CATCGGCGAG CTGAGCAGCG 1200

ACGTCATCTC CAACATCGAG ACCTTCGATG TCAACGAGTT TGACCAGTAC CTGCCGCCCA 1260

ACGGCCACCC GGGGGTGCCG GCCACGCACG GCCAGGTCAC CTACACGGGC AGCTACGGCA 1320

TCAGCAGCAC CGCGGCCACC CCGGCGAGCG CGGGCCACGT GTGGATGTCC AAGCAGCAGG 1380

CGCCGCCGCC ACCCCCGCAG CAGCCCCCAC AGGCCCCGCC GGCCCCGCAG GCGCCCCCGC 1440

AGCCGCAGGC GGCGCCCCCA CAGCAGCCGG CGGCACCCCC GCAGCAGCCA CAGGCGCACA 1500

CGCTGACCAC GCTGAGCAGC GAGCCGGGCC AGTCCCAGCG AACGCACATC AAGACGGAGC 1560

AGCTGAGCCC CAGCCACTAC AGCGAGCAGC AGCAGCACTC GCCCCAACAG ATCGCCTACA 1620

GCCCCTTCAA CCTCCCACAC TACAGCCCCT CCTACCCGCC CATCACCCGC TCACAGTACG 1680

ACTACACCGA CCACCAGAAC TCCAGCTCCT ACTACAGCCA CGCGGCAGGC CAGGGCACCG 1740

GCCTCTACTC CACCTTCACC TACATGAACC CCGCTCAGCG CCCCATGTAC ACCCCCATCG 1800

CCGACACCTC TGGGGTCCCT TCCATCCCGC AGACCCACAG CCCCCAGCAC TGGGAACAAC 1860

CCGTCTACAC ACAGCTCACT CGACCTTGAG GAGGCCTCCC ACGAAGGGCG ACGATGGCCG 1920

AGATGATCCT AAAAATAACC GAAGAAAGAG AGGACCAACC AGAATTCCCT TTGGACATTT 1980

GTGTTTTTTT GTTTTTTTAT TTTGTTTTGT TTTTTCTTCT TCTTCTTCTT CCTTAAAGAC 2040

ATTTAAGCTA AAGGCAACTC GTACCCAAAT TTCCAAGACA CAAACATGAC CTATCCAAGC 2100

GCATTACCCA CTTGTGGCCA ATCAGTGGCC AGGCCAACCT TGGCTAAATG GAGCAGCGAA 2160

ATCAACGAGA AACTGGACTT TTTAAACCCT CTTCAGAGCA AGCGTGGAGG ATGATGGAGA 2220

ATCGTGTGAT CAGTGTGCTA AATCTCTCTG CCTGTTTGGA CTTTGTAATT ATTTTTTTAG 2280

CAGTAATTAA AGAAAAAAGT CCTCTGTGAG GAATATTCTC TATTTTAAAT ATTTTTAGTA 2340

TGTACTGTGT ATGATTCATT ACCATTTTGA GGGGATTTAT ACATATTTTT AGATAAAATT 2400

AAATGCTCTT ATTTTTCCAA CAGCTAAACT ACTCTTAGTT GAACAGTGTG CCCTAGCTTT 2460

TCTTGCAACC AGAGTATTTT TGTACAGATT TGCTTTCTCT TACAAAAAGA AAAAAAAAAT 2520

CCTGTTGTAT TAACATTTAA AAACAGAATT GTGTTATGTG ATCAGTTTTG GGGGTTAACT 2580

TTGCTTAATT CCTCAGGCTT TGCGATTTAA GGAGGAGCTG CCTTAAAAAA AAATAAAGGC 2640

CTTATTTTGC AATTATGGGA GTAAACAATA GTCTAGAGAA GCATTTGGTA AGCTTTATGA 2700

TATATATATT TTTTAAAGAA GAGAAAAACA CCTTGAGCCT TAAAACGGTG CTGCTGGGAA 2760

ACATTTGCAC TCTTTTAGTG CATTTCCTCC TGCCTTTGCT TGTTCACTGC AGTCTTAAGA 2820

AAGAGGTAAA AGGCAAGCAA AGGAGATGAA ATCTGTTCTG GGAATGTTTC AGCAGCCAAT 2880

AAGTGCCCGA GCACACTGCC CCCGGTTGCC TGCCTGGGCC CCATGTGGAA GGCAGATGCC 2940

TGCTCGCTCT GTCACCTGTG CCTCTCAGAA CACCAGCAGT TAACCTTCAA GACATTCCAC 3000

TTGCTAAAAT TATTTATTTT GTAAGGAGAG GTTTTAATTA AAACAAAAAA AAATTCTTTT 3060

TTTTTTTTTT TTTTCCAATT TTACCTTCTT TAAAATAGGT TGTTGGAGCT TTCCTCAAAG 3120

GGTATGGTCA TCTGTTGTTA AATTATGTTC TTAACTGTAA CCAGTTTTTT TTTATTTATC 3180

TCTTTAATCT TTTTTATTAT TAAAAGCAAG TTTCTTTGTA TTCCTCACCC TAGATTTGTA 3240

TAAATGCCTT TTTGTCCATC CCTTTTTTCT TTGTTGTTTT TGTTGAAAAC AAACTGGAAA 3300

CTTGTTTCTT TTTTTGTATA AATGAGAGAT TGCAAATGTA GTGTATCACT GAGTCATTTG 3360

CAGTGTTTTC TGCCACAGAC CTTTGGGCTG CCTTATATTG TGTGTGTGTG TGGGTGTGTG 3420

TGTGTTTTGA CACAAAAACA ATGCAAGCAT GTGTCATCCA TATTTCTCTA CATCTTCTCT 3480

TGGAGTGAGG GAGGCTACCT GGAGGGGATC AGCCCACTGA CAGACCTTAA TCTTAATTAC 3540

TGCTGTGGCT AGAGAGTTTG AGGATTGCTT TTTAAAAAAG ACAGCAAACT TTTTTTTTTA 3600

TTTAAAAAAA GATATATTAA CAGTTTTAGA AGTCAGTAGA ATAAAATCTT AAAGCACTCA 3660

TAATATGGCA TCCTTCAATT TCTGTATAAA AGCAGATCTT TTTAAAAAAG ATACTTCTGT 3720

AACTTAAGAA ACCTGGCATT TAAATCATAT TTTGTCTTTA GGTAAAAGCT TTGGTTTGTG 3780

TTCGTGTTTT GTTTGTTTCA CTTGTTTCCC TCCCAGCCCC AAACCTTTTG TTCTCTCCGT 3840

GAAACTTACC TTTCCCTTTT TCTTTCTCTT TTTTTTTTTG TATATTATTG TTTACAATAA 3900

ATATACATTG CATTAAAAAG AAA 3923



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 509 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Met Asn Leu Leu Asp Pro Phe Met Lys Met Thr Asp Glu Gln Glu Lys
1 5 10 15

Gly Leu Ser Gly Ala Pro Ser Pro Thr Met Ser Glu Asp Ser Ala Gly
20 25 30

Ser Pro Cys Pro Ser Gly Ser Gly Ser Asp Thr Glu Asn Thr Arg Pro
35 40 45

Gln Glu Asn Thr Phe Pro Lys Gly Glu Pro Asp Leu Lys Lys Glu Ser
50 55 60

Glu Glu Asp Lys Phe Pro Val Cys Ile Arg Glu Ala Val Ser Gln Val
65 70 75 80

Leu Lys Gly Tyr Asp Trp Thr Leu Val Pro Met Pro Val Arg Val Asn
85 90 95

Gly Ser Ser Lys Asn Lys Pro His Val Lys Arg Pro Met Asn Ala Phe
100 105 110

Met Val Trp Ala Gln Ala Ala Arg Arg Lys Leu Ala Asp Gln Tyr Pro
115 120 125

His Leu His Asn Ala Glu Leu Ser Lys Thr Leu Gly Lys Leu Trp Arg
130 135 140

Leu Leu Asn Glu Ser Glu Lys Arg Pro Phe Val Glu Glu Ala Glu Arg
145 150 155 160

Leu Arg Val Gln His Lys Lys Asp His Pro Asp Tyr Lys Tyr Gln Pro
165 170 175

Arg Arg Arg Lys Ser Val Lys Asn Gly Gln Ala Glu Ala Glu Glu Ala
180 185 190

Thr Glu Gln Thr His Ile Ser Pro Asn Ala Ile Phe Lys Ala Leu Gln
195 200 205

Ala Asp Ser Pro His Ser Ser Ser Gly Met Ser Glu Val His Ser Pro
210 215 220

Gly Glu His Ser Gly Gln Ser Gln Gly Pro Pro Thr Pro Pro Thr Thr
225 230 235 240

Pro Lys Thr Asp Val Gln Pro Gly Lys Ala Asp Leu Lys Arg Glu Gly
245 250 255

Arg Pro Leu Pro Glu Gly Gly Arg Gln Pro Pro Ile Asp Phe Arg Asp
260 265 270

Val Asp Ile Gly Glu Leu Ser Ser Asp Val Ile Ser Asn Ile Glu Thr
275 280 285